
The Verifier Tax: Horizon Dependent Safety–Success Tradeoffs in Tool Using LLM Agents

Tanmay Sah (Harrisburg University of Science and Technology), Vishal Srivastava (Johns Hopkins University), Dolly Sah (University of Utah), Kayden Jordan (Harrisburg University of Science and Technology)

Security & Privacy · Evaluation & Benchmarking

An empirical study quantifying the 'Verifier Tax': the persistent reduction in task success rate caused by adding runtime safety enforcement to tool-using LLM agents. The results show a model-dependent Safety-Capability Gap, with interaction horizons of 15–30 turns beyond which safety enforcement dominates, giving practitioners concrete guidance on where the trade-off between safety and capability bites.

Presentation

Talk

Paper Session 6: Learning & Control

Thursday, May 28 · 4:20 PM – 4:30 PM

Bayshore Ballroom

Poster

Thursday, May 28 · 4:30 PM – 6:00 PM

Carmel


Abstract

We study how runtime enforcement against unsafe actions affects end-to-end task performance in multi-step, tool-using large language model (LLM) agents. Using τ-bench across the Airline and Retail domains, we compare baseline Tool-Calling, planning-integrated (\triad), and policy-mediated (\triadsafety) architectures with GPT-OSS-20B and GLM-4-9B. We identify model-dependent interaction horizons (15–30 turns) and decompose outcomes into overall success rate (SR), safe success rate (SSR), and unsafe success rate (USR). Our results reveal a persistent “Safety-Capability Gap”: while safety mediation can intercept up to 94% of non-compliant actions, this rarely translates into strictly safe goal attainment (SSR < 5% in most settings). We find that high unsafe success rates are driven primarily by “Integrity Leaks,” in which models hallucinate user identifiers to bypass mandatory authentication. Recovery rates following blocked actions are consistently low, ranging from 21% for GPT-OSS-20B in simpler procedural tasks to near 0% in complex Retail scenarios. These results demonstrate that runtime enforcement imposes a significant “verifier tax” on conversation length and compute cost without guaranteeing safe completion, highlighting the critical need for agents capable of grounded identity verification and post-intervention reasoning.
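The outcome decomposition and the rates mentioned in the abstract can be made concrete with a small Python sketch. It assumes a hypothetical per-episode log record (`Episode`, with fields `success`, `attempted_violations`, `blocked_violations`, `turns`) and assumes that a successful episode counts as unsafe if any non-compliant action executed unblocked, so that SR = SSR + USR. The helper names, the definition of recovery as "success after at least one block", the horizon-based tax, and the toy numbers are all illustrative assumptions, not the authors' code, definitions, or data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    """Hypothetical per-episode log record (not the paper's schema)."""
    success: bool              # task goal reached by the end of the episode
    attempted_violations: int  # non-compliant tool calls the agent proposed
    blocked_violations: int    # of those, how many the safety mediator intercepted
    turns: int                 # conversation length in turns

    @property
    def unsafe(self) -> bool:
        # Assumption: an episode is unsafe if any violation slipped past the mediator.
        return self.attempted_violations > self.blocked_violations


def outcome_rates(episodes: List[Episode]) -> Dict[str, float]:
    """Split overall success rate into safe and unsafe parts, so SR = SSR + USR."""
    n = len(episodes)
    sr = sum(e.success for e in episodes) / n
    ssr = sum(e.success and not e.unsafe for e in episodes) / n
    usr = sum(e.success and e.unsafe for e in episodes) / n
    return {"SR": sr, "SSR": ssr, "USR": usr}


def interception_rate(episodes: List[Episode]) -> float:
    """Fraction of proposed non-compliant actions that the mediator blocked."""
    attempted = sum(e.attempted_violations for e in episodes)
    blocked = sum(e.blocked_violations for e in episodes)
    return blocked / attempted if attempted else 0.0


def recovery_rate(episodes: List[Episode]) -> float:
    """Among episodes with at least one blocked action, fraction that still succeed."""
    hit = [e for e in episodes if e.blocked_violations > 0]
    return sum(e.success for e in hit) / len(hit) if hit else 0.0


def verifier_tax(baseline: List[Episode], mediated: List[Episode], horizon: int) -> float:
    """SR gap at a fixed turn budget: a success only counts if it lands within `horizon` turns."""
    def sr_within(eps: List[Episode]) -> float:
        return sum(e.success and e.turns <= horizon for e in eps) / len(eps)
    return sr_within(baseline) - sr_within(mediated)


# Toy episodes for illustration only; these are not results from the paper.
baseline = [Episode(True, 1, 0, 12), Episode(True, 0, 0, 20),
            Episode(False, 0, 0, 30), Episode(True, 0, 0, 25)]
mediated = [Episode(False, 1, 1, 18), Episode(True, 0, 0, 22),
            Episode(True, 2, 1, 35), Episode(False, 0, 0, 40)]

print(outcome_rates(mediated))                       # {'SR': 0.5, 'SSR': 0.25, 'USR': 0.25}
print(interception_rate(mediated))                   # 2 of 3 attempted violations blocked
print(recovery_rate(mediated))                       # 0.5
print(verifier_tax(baseline, mediated, horizon=15))  # 0.25
print(verifier_tax(baseline, mediated, horizon=30))  # 0.5
```

Measuring the tax as an SR gap at a fixed turn budget is only one way to express horizon dependence; the work itself frames the tax both as lost success rate and as added conversation length and compute, and may define it differently.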
